Free Factories: Unified Infrastructure for Data Intensive Web Services
Authors
Abstract
We introduce the Free Factory, a platform for deploying data-intensive web services using small clusters of commodity hardware and free software. Independently administered virtual machines called Freegols give application developers the flexibility of a general-purpose web server, along with access to distributed batch processing, cache, and storage services. Each cluster uses idle RAM and disk space for cache and reserves disks in each node for high-bandwidth storage. The batch processing service uses a variation of the MapReduce model, and virtualization allows every CPU in the cluster to participate in batch jobs. Each 48-node cluster can achieve 4 to 8 gigabytes per second of disk I/O. We intend to use multiple clusters to process hundreds of simultaneous requests on data sets of several hundred terabytes. Currently, our applications achieve 1 gigabyte per second of I/O with 123 disks by scheduling batch jobs on two clusters, one of which is located in a remote data center.
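The throughput figures above can be turned into rough per-disk and per-node averages. The sketch below is a back-of-envelope calculation only, assuming decimal units (1 GB = 1000 MB) and an even spread of load across disks and nodes; the abstract does not say how load is actually distributed, and the helper name `per_unit_mb_s` is hypothetical.

```python
# Back-of-envelope averages derived from the abstract's throughput figures.
# Assumptions (not from the paper): decimal units and perfectly even load.
MB_PER_GB = 1000

def per_unit_mb_s(total_gb_s: float, units: int) -> float:
    """Average MB/s each unit (disk or node) delivers for a given aggregate rate."""
    return total_gb_s * MB_PER_GB / units

# Current deployment: 1 GB/s across 123 disks spread over two clusters.
per_disk = per_unit_mb_s(1.0, 123)

# Stated capability: 4 to 8 GB/s of disk I/O per 48-node cluster.
per_node_low = per_unit_mb_s(4.0, 48)
per_node_high = per_unit_mb_s(8.0, 48)

if __name__ == "__main__":
    print(f"per disk: {per_disk:.1f} MB/s")
    print(f"per node: {per_node_low:.1f} to {per_node_high:.1f} MB/s")
```

Under these assumptions each disk in the current setup averages about 8.1 MB/s, while the 48-node target corresponds to roughly 83 to 167 MB/s per node.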
Similar resources
Improving the flexibility of active grids through web services
Active Grids are a form of grid infrastructure where the grid network is active and programmable. These grids directly support applications with value added services such as data migration, compression, adaptation and monitoring. Services such as these are particularly important for eResearch applications which by their very nature are performance critical and data intensive. We propose an arch...
BPEL-DT - Data-aware Extension of BPEL to Support Data-Intensive Service Applications
Aside from business processes, the service-oriented approach— currently realized with Web services and BPEL—should be utilizable for data-intensive applications as well. Fundamentally, data-intensive applications are characterized by (i) a sequence of functional operations processing large amounts of data and (ii) the delivery and transformation of huge data sets between those functional activi...
Watershed Reanalysis Towards a National Cyberinfrastructure for Model-Data Integration
Reanalysis or retrospective analysis is the process of re-analyzing and assimilating climate and weather observations with the current modeling context. Reanalysis is an objective, quantitative method of synthesizing all sources of information (historical and real-time observations) within a unified framework. In this context, we propose a prototype for automated and virtualized web services so...
A Controller Based Approach for Web Services Virtualized Instance Allocation
Few Service providers provide compute intensive and data intensive services over web platform; where in applications can be deployed on demand. These service providers usually employ machine virtualization for providing cost effective solution. At the time of infrastructure purchase, one may opt for a particular instance, assuming that this will satisfy the computational needs. Whereas consider...
Distributed Modelling and Simulation for collaborative E-science in Grid Infrastructure
E-science is collaborative science that is made possible by the sharing across the Internet of resources that is often very compute intensive, often very data intensive and crosses organizational and administrative boundaries. The semantic grid annotates the grid with metadata describing the resources it makes available. Semantic grid aims to incorporate the advantages of the grid, semantic web...
Journal title:
- Proceedings of the USENIX ... annual Technical Conference. USENIX Technical Conference
Volume: 2008; Issue: -
Pages: -
Publication date: 2008